Chapter 3 Exploratory analysis
Here you can check the data fields and field descriptions for all variables appearing in the dataset.
Variables related to the trade process:
- control_number: It represents a unique individual shipment processed by the USFWS.
- quantity: It represents the numeric quantity of the wildlife product
- unit: It represents the unit for the numeric quantity
- import_export: It represents whether the shipment is an (I)mport or (E)xport
- action: Action taken by USFWS on import ((C)leared/(R)efused)
- shipment_date: Full date when shipment arrived
- shipment_year: Year when the shipment arrived (derived from “shipment_date”)
- disposition: Fate of the import
- disposition_date: Full date when disposition occurred
- disposition_year: Year when disposition occurred (derived from “disposition_date”)
Variables related to the countries:
- country_origin: It represents the code for the country of origin of the wildlife product
- country_imp_exp: It represents the code for the country to/from which the wildlife product is shipped
- port: It represents the port or region of shipment entry
- us_co: It represents the US party of the shipment
- foreign_co: It represents the foreign party of the shipment
Variables related to the product:
- description: It represents the type/form of the wildlife product
- value: It represents the reported value of the wildlife product in US dollars
- purpose: It represents the reason the wildlife product is being imported
- source: It represents the type of source within the origin country (e.g., wild, bred)
- species_code: It represents the USFWS code for the wildlife product
- taxa: It represents the USFWS-derived broad taxonomic categorization
- class: It represents the EHA-derived class-level taxonomic designation
- genus: It represents the Genus (or higher-level taxonomic name) of the wildlife product
- species: It represents species of the wildlife product
- subspecies: It represents subspecies of the wildlife product
- specific_name: It represents a specific common name for the wildlife product
- generic_name: It represents a general common name for the wildlife product
3.1 Variables related to the trade process
3.1.1 control_number
It represents a unique individual shipment processed by the USFWS. A shipment refers to any container or group of containers that share a common control number, country of shipment and shipment date. Each shipment may be represented in the LEMIS dataset by multiple segments (rows) when its contents come from more than one species or type of product.
- There are 2,079,637 unique shipments containing 5,451,832 segments.
- Of those, 60% (1,265,491 unique shipments) consist of a single segment, meaning that only one species and one type of product were recorded for each of them. The remaining 40% (814,146 unique shipments) contain 4,186,341 segments.
- These multi-segment shipments contain between 2 and 492 segments each. The mean number of segments per multi-segment shipment is about 5; the median is 3.
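The figures in the bullets above (share of single-segment shipments, range, mean and median of multi-segment shipments) can be derived from a per-shipment row count. The following is a minimal sketch on made-up data, written in base R so it runs without the packages used in this chapter; the toy `segments` table and the `summary_stats` list are illustrative assumptions, not part of the original analysis.

```r
# Toy stand-in for the LEMIS table: one row per segment (values are made up)
segments <- data.frame(
  control_number = c("A", "B", "B", "C", "C", "C", "C", "D"),
  stringsAsFactors = FALSE
)

# Number of segments (rows) per shipment
size_segments <- table(segments$control_number)

# Separate single-segment shipments from multi-segment ones
is_multiple <- size_segments > 1
multi_sizes <- as.integer(size_segments[is_multiple])

summary_stats <- list(
  n_shipments  = length(size_segments),   # unique shipments
  n_segments   = nrow(segments),          # total rows
  share_single = mean(!is_multiple),      # fraction with exactly one segment
  range_multi  = range(multi_sizes),      # smallest and largest multi-segment shipment
  mean_multi   = mean(multi_sizes),
  median_multi = median(multi_sizes)
)
print(summary_stats)
```

On the real data, the same steps would start from `data$control_number` instead of the toy column.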
# Dropping unused levels
data$control_number <- droplevels(data$control_number)
# Unique shipments
shipments<- data %>% dplyr::group_by(control_number) %>%
dplyr::summarise(size_segments=n()) %>% ungroup() %>%
mutate(type_segment = ifelse(size_segments==1, "single", "multiple")) %>%
mutate(type_segment=as.factor(type_segment))
shipments<- shipments %>% select(size_segments, type_segment) %>%
mutate(size_segments=as.factor(size_segments)) %>%
dplyr::group_by(size_segments) %>%
dplyr::summarise(total_shipments=n())
DT::datatable(shipments, rownames = FALSE,
caption = htmltools::tags$caption(
style='caption-side: bottom; text-align: center;','Table 1: ',
htmltools::em('Number of unique shipments based on the amount of
segments'))) %>%
formatRound('total_shipments',1)
3.1.2 quantity and unit
Quantity represents the numeric quantity of the wildlife product, while unit represents the unit of measure for the numeric quantity.
- There are 13 types of units. 94.64% of data is measured with the unit “Number” (the number of individual wildlife items).
- The shipments include around 11.5 billion wildlife items, plus 1.1 billion kg of wildlife products measured only by weight.
units <-data %>% group_by(unit) %>%
summarise(total_segments = n(),percentage=n()/nrow(data)) %>%
drop_na(unit) %>%
arrange(desc(percentage))
DT::datatable(units,
caption = htmltools::tags$caption(
style='caption-side: bottom; text-align: center;','Table 1: ',
htmltools::em('Units of measure'
))) %>%
formatRound('total_segments',1) %>%
formatPercentage('percentage',2)
data$quantity<- as.numeric(data$quantity)
quantity<- data %>%
dplyr::group_by(unit) %>%
drop_na(quantity) %>%
dplyr::summarise(quantity = sum(quantity, na.rm=TRUE)) %>%
arrange(desc(quantity))
DT::datatable(quantity,
caption = htmltools::tags$caption(
style='caption-side: bottom; text-align: center;','Table 1: ',
htmltools::em('Quantity per unit'))) %>%
formatRound('quantity',1)
3.1.3 import_export
It represents whether the shipment is an (I)mport or (E)xport. In this dataset, 100% of the records are imports.
imports <-data %>% group_by(import_export) %>%
summarise(total = n(), percentage=n()/nrow(data)) %>%
arrange(desc(percentage))
DT::datatable(imports,
caption = htmltools::tags$caption(
style='caption-side: bottom; text-align: center;','Table 2: ',
htmltools::em('Shipments: Imports and exports'
))) %>%
formatRound('total',1) %>%
formatPercentage('percentage',2)
imports %>% mutate(percentage=percentage*100) %>%
plot_ly(x=~reorder(import_export, desc(percentage)), y=~percentage, color=~import_export) %>%
add_bars() %>%
layout(title = "<b>Shipments: Imports and exports</b>",
xaxis= list(title= "<b>Imports and exports</b>" ,tickangle=-65),
yaxis = list(title = "<b>Percentage</b>"))
3.1.4 action
Action taken by the USFWS on import ((C)leared/(R)efused).
- 98.28% of segments were cleared and only 1.73% were refused. Cleared and refused segments can appear in the same shipment.
- The largest shares of refused segments involve mammals (28%) and reptiles (27%).
- Refused segments were most often exported from Mexico (19%), Canada (9%) and China (8%).
- The country of origin of refused segments is most often unknown (23%), followed by Mexico (14%), Canada (6%), Indonesia (6%) and China (6%).
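Because cleared and refused segments can share a control number, it can help to list the shipments that mix both actions. The sketch below does this on made-up data in base R; the toy `segments` table is an assumption for illustration, while the column names `control_number` and `action` follow the dataset.

```r
# Toy segments (made-up values); column names follow the dataset
segments <- data.frame(
  control_number = c("A", "A", "B", "B", "C", "D", "D"),
  action = c("Cleared", "Refused", "Cleared", "Cleared",
             "Refused", "Refused", "Cleared"),
  stringsAsFactors = FALSE
)

# Count distinct actions per shipment; more than one means a mixed shipment
n_actions <- tapply(segments$action, segments$control_number,
                    function(x) length(unique(x)))
mixed_shipments <- names(n_actions)[n_actions > 1]
print(mixed_shipments)

# Share of refused rows, computed over segments rather than shipments
refused_share <- mean(segments$action == "Refused")
print(refused_share)
```

Note that the percentages reported in this section are computed over segments, so a single mixed shipment contributes to both the cleared and the refused totals.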
action <-data %>% group_by(action) %>%
summarise(total= n(), percentage=n()/nrow(data)) %>%
drop_na(action) %>%
arrange(desc(percentage))
DT::datatable(action,
caption = htmltools::tags$caption(
style='caption-side: bottom; text-align: center;','Table 3: ',
htmltools::em('Action taken by the USFWS on imports'
))) %>%
formatRound('total',1) %>%
formatPercentage('percentage',2)
action %>% mutate(percentage=percentage*100) %>%
plot_ly(x=~reorder(action, desc(percentage)), y=~percentage, color=~action) %>%
add_bars() %>%
layout(title = "<b>Action taken by the USFWS on imports</b>",
xaxis= list(title= "<b>Actions</b>" ,tickangle=-65),
yaxis = list(title = "<b>Percentage</b>"))
refused <- data %>% filter(action=="Refused")
# Country of origin
refused %>% group_by(country_origin) %>%
summarise(total=n(), percentage=n()/nrow(refused)*100) %>%
arrange(desc(percentage)) %>%
top_n(15, percentage) %>%
plot_ly(x=~reorder(country_origin, desc(percentage)), y=~percentage,
color=~country_origin) %>%
add_bars() %>%
layout(title = "<b>Country of origin of refused segments (Top 15)</b>",
xaxis= list(title= "<b>Country of origin</b>" ,tickangle=-65),
yaxis = list(title = "<b>Percentage</b>"))
# Country of export
refused %>% group_by(country_imp_exp) %>%
summarise(total=n(), percentage=n()/nrow(refused)*100) %>%
arrange(desc(percentage)) %>%
top_n(15, percentage) %>%
plot_ly(x=~reorder(country_imp_exp, desc(percentage)), y=~percentage,
color=~country_imp_exp) %>%
add_bars() %>%
layout(title = "<b>Country of export of refused segments (Top 15)</b>",
xaxis= list(title= "<b>Country of export</b>" ,tickangle=-65),
yaxis = list(title = "<b>Percentage</b>"))
# Taxa
refused %>% group_by(taxa) %>%
summarise(total=n(), percentage=n()/nrow(refused)*100) %>%
arrange(desc(percentage)) %>%
top_n(15, percentage) %>%
plot_ly(x=~reorder(taxa, desc(percentage)), y=~percentage,
color=~taxa) %>%
add_bars() %>%
layout(title = "<b>Taxa of refused segments (Top 15)</b>",
xaxis= list(title= "<b>Taxa</b>" ,tickangle=-65),
yaxis = list(title = "<b>Percentage</b>"))
# Genus
refused %>%
drop_na(genus) %>%
mutate(genus=paste(taxa,"-", genus)) %>%
group_by(genus) %>%
summarise(total=n(), percentage=n()/nrow(refused)*100) %>%
arrange(desc(percentage)) %>%
top_n(15, percentage) %>%
plot_ly(x=~reorder(genus, desc(percentage)), y=~percentage,
color=~genus) %>%
add_bars() %>%
layout(title = "<b>Figure 37:Genus of refused segments (Top 15)</b>",
xaxis= list(title= "<b>Genus</b>" ,tickangle=-65),
yaxis = list(title = "<b>Percentage</b>"))
# Port
refused %>%
group_by(port) %>%
summarise(total=n(), percentage=n()/nrow(refused)*100) %>%
arrange(desc(percentage)) %>%
top_n(15, percentage) %>%
plot_ly(x=~reorder(port, desc(percentage)), y=~percentage,
color=~port) %>%
add_bars() %>%
layout(title = "<b>Port of refused segments (Top 15)</b>",
xaxis= list(title= "<b>Port</b>" ,tickangle=-65),
yaxis = list(title = "<b>Percentage</b>"))
3.1.5 disposition
It represents the fate of the import.
- There are 5 categories: cleared, abandoned, reexport, seized and others.
- Most segments (98.3%) are cleared.
disposition <-data %>% group_by(disposition) %>%
summarise(total = n(), percentage=n()/nrow(data)) %>%
drop_na(disposition) %>%
arrange(desc(percentage))
DT::datatable(disposition,
caption = htmltools::tags$caption(
style='caption-side: bottom; text-align: center;','Table 4: ',
htmltools::em('Fate of the import'
))) %>%
formatRound('total',1) %>%
formatPercentage('percentage',2)
disposition %>% mutate(percentage=percentage*100) %>%
plot_ly(x=~reorder(disposition, desc(percentage)), y=~percentage, color=~disposition) %>%
add_bars() %>%
layout(title = "<b>Fate of the import</b>",
xaxis= list(title= "<b>Dispositions</b>" ,tickangle=-65),
yaxis = list(title = "<b>Percentage</b>"))
3.1.6 shipment_date and disposition_date
shipment_date represents the date when the shipment arrived, while disposition_date represents the date when the disposition occurred.
- 54% of dispositions took place within a month of the shipment date (most of them within a week).
- While ‘shipment_date’ entries fall entirely within 2000–2014, ‘disposition_date’ values range more widely.
- Users should be wary of any disposition date that precedes the associated shipment date, as we are unaware of how this could represent an accurate accounting of the product disposition process. For many analyses, however, differences between the date fields may not be a significant concern, because ‘shipment_date’ alone provides a sound index for those interested in temporal trends in wildlife trade.
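One way to act on the caution above is to compute the lag between the two dates and flag rows where the disposition precedes the shipment. This is a minimal base R sketch on made-up dates; the `lag_days` and `suspect` columns are hypothetical names introduced here, and the shares are taken over all rows, as in the table that follows.

```r
# Toy data (made-up dates); column names follow the dataset
segments <- data.frame(
  shipment_date    = as.Date(c("2010-01-10", "2010-01-10", "2010-03-01",
                               "2010-05-20", "2010-06-01")),
  disposition_date = as.Date(c("2010-01-12", "2010-01-30", "2010-02-27",
                               "2010-05-20", NA))
)

# Lag in whole days between arrival and disposition
segments$lag_days <- as.numeric(segments$disposition_date - segments$shipment_date)

# Flag dispositions recorded before the shipment arrived
segments$suspect <- !is.na(segments$lag_days) & segments$lag_days < 0

# Shares over all rows; missing or negative lags count as outside the window
in_window <- function(lag, max_days) !is.na(lag) & lag >= 0 & lag <= max_days
share_week  <- mean(in_window(segments$lag_days, 7))
share_month <- mean(in_window(segments$lag_days, 30))

print(segments[segments$suspect, ])
print(c(week = share_week, month = share_month))
```

Whether suspect rows should be dropped or kept depends on the analysis; the sketch only makes them visible.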
days<- data %>%
mutate(days = as.factor(as.numeric(disposition_date - shipment_date))) %>%
group_by(days) %>%
summarise(total = n(), percentage=n()/nrow(data)) %>%
filter(days %in% as.character(0:30))
DT::datatable(days,
caption = htmltools::tags$caption(
style='caption-side: bottom; text-align: center;','Table 5: ',
htmltools::em('Number of dispositions within a month of the shipment date'
))) %>%
formatRound('total',1) %>%
formatPercentage('percentage',2)
3.2 Variables related to the countries
3.2.1 country_origin
It represents the code for the country of origin of the wildlife product. Shipments come from 252 different countries of origin, although 75% of all segments come from the same 15 countries. The top five countries of origin were Indonesia (15%), Canada (10%), South Africa (9%), Philippines (8%) and China (7%). Together, they represent around 50% of all segments.
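Statements such as "15 countries cover 75% of all segments" come from a cumulative share over countries ranked by segment count. The following base R sketch shows the idea on made-up counts; the toy vector and the 75% threshold variable `n_for_75` are illustrative assumptions.

```r
# Toy segment-level column (made-up counts)
country_origin <- c(rep("Indonesia", 5), rep("Canada", 3),
                    rep("China", 1), rep("Peru", 1))

# Segments per country, largest first
counts <- sort(table(country_origin), decreasing = TRUE)

# Running share of all segments covered by the top-k countries
cum_share <- cumsum(as.numeric(counts)) / sum(counts)
names(cum_share) <- names(counts)
print(round(cum_share, 2))

# Smallest number of countries needed to reach 75% of segments
n_for_75 <- which(cum_share >= 0.75)[1]
print(n_for_75)
```

The same calculation applies to any of the categorical fields in this section, for example ports or US parties.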
# Dropping unused levels
data$country_origin <- droplevels(data$country_origin)
country_origin <-data %>% group_by(country_origin) %>%
summarise(total = n(), percentage=n()/nrow(data)) %>%
drop_na(country_origin) %>%
arrange(desc(percentage))
DT::datatable(country_origin,
caption = htmltools::tags$caption(
style='caption-side: bottom; text-align: center;','Table 6: ',
htmltools::em('Country of origin of the wildlife product'
))) %>%
formatRound('total',1) %>%
formatPercentage('percentage',2)
country_origin %>% mutate(percentage=percentage*100) %>%
top_n(15, percentage) %>%
plot_ly(x=~reorder(country_origin, desc(percentage)), y=~percentage,
color=~country_origin) %>%
add_bars() %>%
layout(title = "<b>Figure 10: Country of origin of the wildlife product (Top 15)</b>",
xaxis= list(title= "<b>Country of origin</b>" ,tickangle=-65),
yaxis = list(title = "<b>Percentage</b>"))
# Country of origin by quantity
# We'll only include in the analysis those items measured in "number" and "kg" units, because they represent 99.45% of the data.
data %>%
filter(unit=="Number") %>%
group_by(country_origin, unit) %>%
summarise(total = sum(quantity)) %>%
arrange(desc(total)) %>%
ungroup() %>%
top_n(15, total) %>%
plot_ly(x=~reorder(country_origin, desc(total)), y=~total,
color=~country_origin) %>%
add_bars() %>%
layout(title = "<b>Figure 12: Most common country of origin (by number) (Top 15)</b>",
xaxis= list(title= "<b>Country of origin</b>" ,tickangle=-65),
yaxis = list(title = "<b>Total</b>"))
data %>%
filter(unit=="Kilograms") %>%
group_by(country_origin, unit) %>%
summarise(total = sum(quantity)) %>%
arrange(desc(total)) %>%
ungroup() %>%
top_n(15, total) %>%
plot_ly(x=~reorder(country_origin, desc(total)), y=~total,
color=~country_origin) %>%
add_bars() %>%
layout(title = "<b>Figure 13: Most common country of origin (by Kilograms) (Top 15)</b>",
xaxis= list(title= "<b>Country of origin</b>" ,tickangle=-65),
yaxis = list(title = "<b>Total</b>"))
country_origin_top5<- data %>%
filter(country_origin %in% c("Indonesia", "Canada", "South Africa",
"Philippines", "China")) %>%
group_by(country_origin, taxa) %>%
dplyr::summarise(n=n()) %>%
mutate(percentage = n/sum(n)*100) %>%
arrange(country_origin, desc(percentage)) %>%
top_n(3, percentage)
country_origin_top5<- droplevels(country_origin_top5)
plot_ly(country_origin_top5, x=~country_origin,
y=~percentage, group= ~taxa,
type="bar", color=~taxa) %>%
layout(title = "<b>Figure 11: Common taxa by country of origin</b>",
xaxis= list(title= "<b>Country of origin</b>" ,tickangle=-65),
yaxis = list(title = "<b>Percentage</b>"))
3.2.2 country_imp_exp
It represents the code for the country to/from which the wildlife product is shipped. Shipments were exported to the United States from 257 different countries, although the top 15 represent 75% of all segments. The top five countries of export were Canada (12%), Indonesia (12%), South Africa (8%), Philippines (8%) and Italy (7%).
# Dropping unused levels
data$country_imp_exp <- droplevels(data$country_imp_exp)
country_imp_exp <-data %>% group_by(country_imp_exp) %>%
summarise(total = n(), percentage=n()/nrow(data)) %>%
drop_na(country_imp_exp) %>%
arrange(desc(percentage))
DT::datatable(country_imp_exp,
caption = htmltools::tags$caption(
style='caption-side: bottom; text-align: center;','Table 7: ',
htmltools::em('Country to/from which the wildlife product is shipped'
))) %>%
formatRound('total',1) %>%
formatPercentage('percentage',2)
country_imp_exp %>% mutate(percentage=percentage*100) %>%
top_n(15, percentage) %>%
plot_ly(x=~reorder(country_imp_exp, desc(percentage)), y=~percentage,
color=~country_imp_exp) %>%
add_bars() %>%
layout(title = "<b>Figure 14: Country that exported the wildlife product to the US (Top 15)</b>",
xaxis= list(title= "<b>country_imp_exp</b>" ,tickangle=-65),
yaxis = list(title = "<b>Percentage</b>"))
# Country of export by quantity
# We'll only include in the analysis those items measured in "number" and "kg" units, because they represent 99.45% of the data.
data %>%
filter(unit=="Number") %>%
group_by(country_imp_exp, unit) %>%
summarise(total = sum(quantity)) %>%
arrange(desc(total)) %>%
ungroup() %>%
top_n(15, total) %>%
plot_ly(x=~reorder(country_imp_exp, desc(total)), y=~total,
color=~country_imp_exp) %>%
add_bars() %>%
layout(title = "<b>Figure 16: Most common country of export (by number) (Top 15)</b>",
xaxis= list(title= "<b>Country of export</b>" ,tickangle=-65),
yaxis = list(title = "<b>Total</b>"))
data %>%
filter(unit=="Kilograms") %>%
group_by(country_imp_exp, unit) %>%
summarise(total = sum(quantity)) %>%
arrange(desc(total)) %>%
ungroup() %>%
top_n(15, total) %>%
plot_ly(x=~reorder(country_imp_exp, desc(total)), y=~total,
color=~country_imp_exp) %>%
add_bars() %>%
layout(title = "<b>Figure 17: Most common country of export (by Kilograms) (Top 15)</b>",
xaxis= list(title= "<b>Country of export</b>" ,tickangle=-65),
yaxis = list(title = "<b>Total</b>"))
country_export_top5<- data %>%
filter(country_imp_exp %in% c("Canada", "Indonesia", "South Africa",
"Philippines", "Italy")) %>%
group_by(country_imp_exp, taxa) %>%
dplyr::summarise(n=n()) %>%
mutate(percentage = n/sum(n)*100) %>%
arrange(country_imp_exp, desc(percentage)) %>%
top_n(3, percentage)
country_export_top5<- droplevels(country_export_top5)
plot_ly(country_export_top5, x=~country_imp_exp,
y=~percentage, group= ~taxa,
type="bar", color=~taxa) %>%
layout(title = "<b>Figure 15: Common taxa by country of export</b>",
xaxis= list(title= "<b>Country of export</b>" ,tickangle=-65),
yaxis = list(title = "<b>Percentage</b>"))
The most active countries appear on both sides, as country of origin and as country of export.
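The overlap can be checked directly with set operations on the two top-five lists reported earlier in this section. This is a small base R sketch; the vectors are typed in by hand from the text rather than computed from the data.

```r
# Top-five lists as reported in sections 3.2.1 and 3.2.2
top_origin <- c("Indonesia", "Canada", "South Africa", "Philippines", "China")
top_export <- c("Canada", "Indonesia", "South Africa", "Philippines", "Italy")

# Countries present on both sides, and those present on only one
both        <- intersect(top_origin, top_export)
only_origin <- setdiff(top_origin, top_export)
only_export <- setdiff(top_export, top_origin)

print(both)
print(only_origin)
print(only_export)
```

Four of the five countries are shared, which is why the comparison that follows keeps the top four from each table.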
country_origin <- country_origin %>%
mutate(percentage=percentage*100) %>% top_n(4, percentage) %>%
rename(country = country_origin)
country_imp_exp <- country_imp_exp %>%
mutate(percentage=percentage*100) %>% top_n(4, percentage) %>%
rename(country = country_imp_exp)
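# Stack both tables; the `source` column records which table each row came from
countries<- dplyr::bind_rows(country_origin = country_origin,
country_imp_exp = country_imp_exp, .id = "source")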
countries %>%
plot_ly(x=~reorder(country, desc(percentage)), y=~percentage,
color=~source) %>%
add_bars() %>%
layout(title = "<b>Most active countries</b>",
xaxis= list(title= "<b>Country</b>" ,tickangle=-65),
yaxis = list(title = "<b>Percentage</b>"))
3.2.3 port
It represents the port of entry. Although there are around 328 ports of entry into the United States, only 73 of them are represented in this dataset, because not all ports are covered by USFWS wildlife inspectors. By law, only 18 are designated ports with full-time inspectors.
Based on the number of segments, the top five ports of entry were Los Angeles (20%), New York (19%), Miami (6%), Newark (5%) and San Francisco (5%). Together, they represent 56% of all segments, and the top 15 represent 84%. Notably, the top five ports of entry are all designated ports.
# Dropping unused levels
data$port <- droplevels(data$port)
port <-data %>% group_by(port) %>%
summarise(total = n(), percentage=n()/nrow(data)) %>%
drop_na(port) %>%
arrange(desc(percentage))
DT::datatable(port,
caption = htmltools::tags$caption(
style='caption-side: bottom; text-align: center;','Table 8: ',
htmltools::em('Port or region of shipment entry'
))) %>%
formatRound('total',1) %>%
formatPercentage('percentage',2)
port %>% mutate(percentage=percentage*100) %>%
top_n(15, percentage) %>%
plot_ly(x=~reorder(port, desc(percentage)), y=~percentage,
color=~port) %>%
add_bars() %>%
layout(title = "<b>Figure 18: Port of entry (Top 15)</b>",
xaxis= list(title= "<b>Port</b>" ,tickangle=-65),
yaxis = list(title = "<b>Percentage</b>"))
# Ports by quantity
# We'll only include in the analysis those items measured in "number" and "kg" units, because they represent 99.45% of the data.
data %>%
filter(unit=="Number") %>%
group_by(port, unit) %>%
summarise(total = sum(quantity)) %>%
arrange(desc(total)) %>%
ungroup() %>%
top_n(15, total) %>%
plot_ly(x=~reorder(port, desc(total)), y=~total,
color=~port) %>%
add_bars() %>%
layout(title = "<b>Figure 20: Most common ports (by number) (Top 15)</b>",
xaxis= list(title= "<b>Port</b>" ,tickangle=-65),
yaxis = list(title = "<b>Total</b>"))
data %>%
filter(unit=="Kilograms") %>%
group_by(port, unit) %>%
summarise(total = sum(quantity)) %>%
arrange(desc(total)) %>%
ungroup() %>%
top_n(15, total) %>%
plot_ly(x=~reorder(port, desc(total)), y=~total,
color=~port) %>%
add_bars() %>%
layout(title = "<b>Figure 21: Most common ports (by Kilograms) (Top 15)</b>",
xaxis= list(title= "<b>Port</b>" ,tickangle=-65),
yaxis = list(title = "<b>Total</b>"))
ports_top5<- data %>%
filter(port %in% c("Los Angeles, CA", "New York, NY", "Miami, FL",
"Newark, NJ", "San Francisco, CA")) %>%
group_by(port, taxa) %>%
dplyr::summarise(n=n()) %>%
mutate(percentage = n/sum(n)*100) %>%
arrange(port, desc(percentage)) %>%
top_n(3, percentage)
ports_top5<- droplevels(ports_top5)
plot_ly(ports_top5, x=~port,
y=~percentage, group= ~taxa,
type="bar", color=~taxa) %>%
layout(title = "<b>Figure 19: Common taxa by port</b>",
xaxis= list(title= "<b>Port</b>" ,tickangle=-65),
yaxis = list(title = "<b>Percentage</b>"))
3.2.4 Trade routes
trade_routes <-data %>% group_by(country_imp_exp, port) %>%
summarise(total = n(), percentage=n()/nrow(data)) %>%
drop_na(port, country_imp_exp) %>%
arrange(desc(percentage))
DT::datatable(trade_routes,
caption = htmltools::tags$caption(
style='caption-side: bottom; text-align: center;','Table 8: ',
htmltools::em('Trade routes'
))) %>%
formatRound('total',1) %>%
formatPercentage('percentage',2)
routes_top5<- data %>%
mutate(route = paste(country_imp_exp,"-", port)) %>%
filter(route %in% c("Indonesia - Los Angeles, CA",
"Italy - New York, NY",
"South Africa - New York, NY",
"Hong Kong SAR China - New York, NY",
"Philippines - Los Angeles, CA")) %>%
group_by(route, taxa) %>%
dplyr::summarise(n=n()) %>%
mutate(percentage = n/sum(n)*100) %>%
arrange(route, desc(percentage)) %>%
top_n(3, percentage)
routes_top5<- droplevels(routes_top5)
plot_ly(routes_top5, x=~route,
y=~percentage, group= ~taxa,
type="bar", color=~taxa) %>%
layout(title = "<b>Figure 22: Common taxa by route</b>",
xaxis= list(title= "<b>Route</b>" ,tickangle=-65),
yaxis = list(title = "<b>Percentage</b>"))
3.2.5 us_co
It represents the US party of the shipment.
- We have excluded the “EXEMPTIONS 6 AND 7(C)” entries from the analysis.
- There are 126,052 US parties.
- The top 15 account for only 10.3% of the data.
- The top 50 account for only 19% of the data.
us_co <-data %>%
filter(!us_co == "EXEMPTIONS 6 AND 7(C)") %>%
group_by(us_co) %>%
summarise(total = n(), percentage=n()/nrow(data)) %>%
drop_na(us_co) %>%
arrange(desc(percentage))
DT::datatable(us_co,
caption = htmltools::tags$caption(
style='caption-side: bottom; text-align: center;','Table 9: ',
htmltools::em('US party of the shipment'
))) %>%
formatRound('total',1) %>%
formatPercentage('percentage',2)
us_co %>% mutate(percentage=percentage*100) %>%
top_n(20, percentage) %>%
plot_ly(x=~reorder(us_co, desc(percentage)), y=~percentage,
color=~us_co) %>%
add_bars() %>%
layout(title = "<b>US party of the shipment (Top 20)</b>",
xaxis= list(title= "<b>US party</b>" ,tickangle=-65),
yaxis = list(title = "<b>Percentage</b>"))
We want to analyze which American corporations import the most, so we have grouped the major corporations based on their company names.
We’ve grouped most of the companies that together represent at least 19% of the data, covering approximately 1,000,000 observations.
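The grouping step itself is not shown in this chapter. One plausible way to build columns like `corporation` and `corp_classif` is to normalise the raw `us_co` strings and match them against a small table of patterns. The sketch below illustrates that idea in base R; the company names, the patterns and the `rules` table are all hypothetical, and only the classification labels and the "Others" fallback are taken from the code in this section.

```r
# Hypothetical raw company names, for illustration only
us_co <- c("ACME LEATHER INC", "Acme Leather, Inc.",
           "REEF AQUATICS LLC", "SMITH, JOHN")

# Normalise: upper-case, replace punctuation, collapse whitespace
clean <- toupper(us_co)
clean <- gsub("[[:punct:]]", " ", clean)
clean <- trimws(gsub("\\s+", " ", clean))

# Hypothetical lookup: pattern -> corporation label and classification
rules <- data.frame(
  pattern      = c("^ACME LEATHER", "^REEF AQUATICS"),
  corporation  = c("Acme Leather", "Reef Aquatics"),
  corp_classif = c("Fashion/Luxury/Design", "Animal/animal prod. providers"),
  stringsAsFactors = FALSE
)

# Default every row to "Others", then overwrite rows that match a rule
corporation  <- rep("Others", length(clean))
corp_classif <- rep("Others", length(clean))
for (i in seq_len(nrow(rules))) {
  hit <- grepl(rules$pattern[i], clean)
  corporation[hit]  <- rules$corporation[i]
  corp_classif[hit] <- rules$corp_classif[i]
}

result <- data.frame(us_co, corporation, corp_classif)
print(result)
```

Pattern matching of this kind merges spelling variants of the same company, but it needs manual review, since short patterns can also capture unrelated names.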
# Let's summarize and graph these corporations
data_corporations<- data %>%
filter(corp_classif!="Others") %>%
group_by(corp_classif, corporation) %>%
dplyr::summarise(total=n(), percentage=n()/nrow(data)) %>%
arrange(desc(total))
DT::datatable(data_corporations,
caption = htmltools::tags$caption(
style='caption-side: bottom; text-align: center;','Table 10: ',
htmltools::em('Which American corporations are importing the most'
))) %>%
formatRound('total',1) %>%
formatPercentage('percentage',2)
## Graph by classification
data_corporations %>% mutate(percentage=percentage*100) %>%
filter(corp_classif == "Fashion/Luxury/Design") %>%
plot_ly(x=~reorder(corporation, desc(percentage)), y=~percentage,
marker = list(color = "coral")) %>%
add_bars() %>%
layout(title = "<b>Figure 23: Fashion/Luxury/Design Corporations</b>",
xaxis= list(title= "<b>American corporation</b>" ,tickangle=-65),
yaxis = list(title = "<b>Percentage</b>"))
data_corporations %>% mutate(percentage=percentage*100) %>%
filter(corp_classif == "Animal/animal prod. providers") %>%
plot_ly(x=~reorder(corporation, desc(percentage)), y=~percentage,
marker = list(color = "green")) %>%
add_bars() %>%
layout(title = "<b>Figure 24: Animal/animal prod. providers Corporations</b>",
xaxis= list(title= "<b>American corporation</b>" ,tickangle=-65),
yaxis = list(title = "<b>Percentage</b>"))
data_corporations %>%
group_by(corp_classif) %>%
dplyr::summarise(total=sum(total), percentage=sum(percentage)) %>%
mutate(percentage=percentage*100) %>%
plot_ly(x=~reorder(corp_classif, desc(percentage)), y=~percentage,
color=~corp_classif) %>% add_bars() %>%
layout(title = "<b>American corporation classification </b>",
xaxis= list(title= "<b>American corporation</b>" ,tickangle=-65),
yaxis = list(title = "<b>Percentage</b>"))
3.2.6 foreign_co
It represents the foreign party of the shipment.
- There are 237,994 foreign parties.
- The top 15 account for only 5.4% of the data.
- The top 50 account for only 12.4% of the data.
foreign_co <-data %>%
filter(!foreign_co == "EXEMPTIONS 6 AND 7(C)") %>%
group_by(foreign_co) %>%
summarise(total = n(), percentage=n()/nrow(data)) %>%
drop_na(foreign_co) %>%
arrange(desc(percentage))
DT::datatable(foreign_co,
caption = htmltools::tags$caption(
style='caption-side: bottom; text-align: center;','Table 11: ',
htmltools::em('Foreign party of the shipment'
))) %>%
formatRound('total',1) %>%
formatPercentage('percentage',2)